Problem 8.3: σ factor activity, 30 pts extra credit

In this problem, we will fully work up the data we studied in Tutorial 9b. You downloaded the data via Dropbox. We will consider all images starting with snaps001, snaps002, and snaps003. Be sure to read the README.pdf file from Jin Park describing the data set.

a) Segment all images using the RFP channel, and then compute the mean intensity of each bacterium in the CFP and YFP channels. Show a representative segmentation (one each for the 001, 002, and 003 sets) by overlaying your binary segmented image on a phase image.

In [1]:
import warnings

# Workhorse
import numpy as np
import pandas as pd

# Find files
import glob
import os

# A whole bunch of skimage stuff
import skimage.feature
import skimage.filter
import skimage.filter.rank
import skimage.io
import skimage.morphology
import skimage.restoration
import skimage.segmentation
import skimage.transform

# Regionprops_to_df
import sys
sys.path.insert(0, '/Users/chigozie/git/regionprops_to_df/')
import regionprops_to_df

# And some useful scipy.ndimage stuff
import scipy.ndimage

# Import plotting tools
import matplotlib.pyplot as plt
import seaborn as sns

# Interaction
import ipywidgets

# Magic function to make matplotlib inline; other style specs must come AFTER
%matplotlib inline

# This enables high res graphics inline (only use with static plots (non-Bokeh))
# SVG is preferred, but there is a bug in Jupyter with vertical lines
%config InlineBackend.figure_formats = {'png', 'retina'}

# JB's favorite Seaborn settings for notebooks
rc = {'lines.linewidth': 2, 
      'axes.labelsize': 18, 
      'axes.titlesize': 18, 
      'axes.facecolor': 'DFDFE5'}
sns.set_context('notebook', rc=rc)
sns.set_style('darkgrid', rc=rc)

# Suppress future warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
//anaconda/lib/python3.4/site-packages/skimage/filter/__init__.py:6: skimage_deprecation: The `skimage.filter` module has been renamed to `skimage.filters`.  This placeholder module will be removed in v0.13.
  warn(skimage_deprecation('The `skimage.filter` module has been renamed '
//anaconda/lib/python3.4/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))
In [2]:
# The directory containing daytime data
data_dir = '../../bebi103/data/park_et_al/'

# Glob string for images
im_list = []
for i in range(1,4):
    im_glob = os.path.join(data_dir, 'snaps00%s*.tif' % i)
    im_list += glob.glob(im_glob)

# Let's look at the first 10 entries
im_list[:10]
Out[2]:
['../../bebi103/data/park_et_al/snaps001-001-c.tif',
 '../../bebi103/data/park_et_al/snaps001-001-p.tif',
 '../../bebi103/data/park_et_al/snaps001-001-r.tif',
 '../../bebi103/data/park_et_al/snaps001-001-y.tif',
 '../../bebi103/data/park_et_al/snaps001-002-c.tif',
 '../../bebi103/data/park_et_al/snaps001-002-p.tif',
 '../../bebi103/data/park_et_al/snaps001-002-r.tif',
 '../../bebi103/data/park_et_al/snaps001-002-y.tif',
 '../../bebi103/data/park_et_al/snaps001-003-c.tif',
 '../../bebi103/data/park_et_al/snaps001-003-p.tif']
In [3]:
# Make data frame containing images
df = pd.DataFrame(columns=['strain', 'field_of_view', 'channel', 'filepath', 'image', 'histogram'])
df.filepath = im_list
df.strain = [i[-13:-10] for i in df.filepath]
df.field_of_view = [i[-9:-6] for i in df.filepath]
df.channel = [i[-5] for i in df.filepath]
df.image = [skimage.io.imread(i) for i in df.filepath]
df.histogram = [skimage.exposure.histogram(im) for im in df.image]
# Take a look
df.head()
Out[3]:
strain field_of_view channel filepath image histogram
0 001 001 c ../../bebi103/data/park_et_al/snaps001-001-c.tif [[441, 421, 443, 458, 422, 424, 399, 427, 423,... ([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 4, 3,...
1 001 001 p ../../bebi103/data/park_et_al/snaps001-001-p.tif [[1321, 1364, 1293, 1313, 1297, 1338, 1347, 13... ([1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 2, 0, 1, 0, 0,...
2 001 001 r ../../bebi103/data/park_et_al/snaps001-001-r.tif [[243, 261, 250, 243, 250, 273, 247, 262, 256,... ([1, 0, 0, 1, 0, 4, 2, 3, 10, 7, 16, 10, 36, 5...
3 001 001 y ../../bebi103/data/park_et_al/snaps001-001-y.tif [[385, 338, 369, 392, 366, 377, 353, 371, 374,... ([1, 1, 1, 1, 3, 5, 6, 5, 8, 20, 16, 23, 22, 3...
4 001 002 c ../../bebi103/data/park_et_al/snaps001-002-c.tif [[429, 430, 406, 413, 408, 433, 427, 412, 422,... ([1, 1, 0, 1, 1, 3, 3, 1, 7, 5, 6, 15, 14, 18,...
In [4]:
# Upsample p, c, y (2 means 2x as big, order=0 means no interpolation)
upsample_list = [scipy.ndimage.zoom(im, 2, order=0) for im in df.loc[df.channel!='r','image']]
df.loc[df.channel!='r','image'] = pd.Series(upsample_list, index=df[df.channel!='r'].index)
In [5]:
# So we have it, the interpixel distance
physical_size = 0.065  # microns

# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 4, figsize=(12,40))
    ax[0,0].set_title('Phase')
    ax[0,1].set_title('RFP (segment)')
    ax[0,2].set_title('CFP')
    ax[0,3].set_title('YFP')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='p','image'].values[i], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','image'].values[i], cmap=plt.cm.viridis)
        ax[i,2].imshow(df.loc[df.channel=='c','image'].values[i], cmap=plt.cm.viridis)
        ax[i,3].imshow(df.loc[df.channel=='y','image'].values[i], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])
        #ax[1].imshow(df.image[0][subim], cmap=plt.cm.viridis)
In [6]:
# Zoom in
subim = np.s_[100:300, 850:1050]
# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 4, figsize=(12,40))
    ax[0,0].set_title('Phase')
    ax[0,1].set_title('RFP (segment)')
    ax[0,2].set_title('CFP')
    ax[0,3].set_title('YFP')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='p','image'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','image'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,2].imshow(df.loc[df.channel=='c','image'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,3].imshow(df.loc[df.channel=='y','image'].values[i][subim], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])
        #ax[1].imshow(df.image[0][subim], cmap=plt.cm.viridis)
In [7]:
# Look at all the histograms for strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 4, figsize=(12,40))
    ax[0,0].set_title('Phase')
    ax[0,1].set_title('RFP (segment)')
    ax[0,2].set_title('CFP')
    ax[0,3].set_title('YFP')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        hist_p, bins_p = df.loc[df.channel=='p','histogram'].values[i]
        hist_r, bins_r = df.loc[df.channel=='r','histogram'].values[i]
        hist_c, bins_c = df.loc[df.channel=='c','histogram'].values[i]
        hist_y, bins_y = df.loc[df.channel=='y','histogram'].values[i]
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])
        ax[i,0].fill_between(bins_p, hist_p, cmap=plt.cm.viridis)        
        ax[i,1].fill_between(bins_r, hist_r, cmap=plt.cm.viridis)        
        ax[i,2].fill_between(bins_c, hist_c, cmap=plt.cm.viridis)        
        ax[i,3].fill_between(bins_y, hist_y, cmap=plt.cm.viridis)
    
    # Make log scale
    for i in ax:
        for j in i:
            j.set_yscale('log')

There are clearly some strange images here. For example, the phase of 006 is unimodal. Looking back at the image itself, we can see that there was inversion of the image. We need to deal with this somehow.

Let's try a very simple segmentation first. It looks from the histograms like values below about 500 in the RFP channel are background.

In [8]:
# Make threshold greater than 500
df.thresh = None
df.loc[df.channel=='r','thresh'] = [df.loc[(df.channel=='r'),'image'].values[i]>500 for i, _ in enumerate(df.loc[(df.channel=='r'),'image'])]

# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 2, figsize=(12,40))
    ax[0,0].set_title('Threshold (>500)')
    ax[0,1].set_title('RFP (segment)')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='r','thresh'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','image'].values[i][subim], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])

This is OK, but clearly not perfect. Instead, let's try the thresholding function from Tutorial 9:

In [9]:
# Reuse thresholding function from Tutorial 9
def bebi103_thresh(im, selem, white_true=True, k_range=(0.5, 1.5), 
                   min_size=100):
    """
    Threshold image as described above.  Morphological mean filter is 
    applied using selem.
    """
    
    # Determine comparison operator
    if white_true:
        compare = np.greater
        sign = -1
    else:
        compare = np.less
        sign = 1
    
    # Do the mean filter
    im_mean = skimage.filter.rank.mean(im, selem)

    # Compute number of pixels in binary image as a function of k
    k = np.linspace(k_range[0], k_range[1], 100)
    n_pix = np.empty_like(k)
    for i in range(len(k)):
        n_pix[i] = compare(im, k[i] * im_mean).sum() 

    # Compute rough second derivative
    dn_pix_dk2 = np.diff(np.diff(n_pix))

    # Find index of maximal second derivative
    max_ind = np.argmax(sign * dn_pix_dk2)

    # Use this index to set k
    k_opt = k[max_ind - sign * 2]

    # Threshold with this k
    im_bw = compare(im, k_opt * im_mean)

    # Remove all the small objects
    im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)
   
    return im_bw, k_opt, im_mean
In [10]:
# Threshold
df.thresh = None
selem = skimage.morphology.disk(10)
df.loc[df.channel=='r','thresh'] = [bebi103_thresh(df.loc[(df.channel=='r'),'image'].values[i], selem) for i, _ in enumerate(df.loc[(df.channel=='r'),'image'])]
//anaconda/lib/python3.4/site-packages/skimage/filters/rank/generic.py:68: UserWarning: Bitdepth of 11 may result in bad rank filter performance due to large number of bins.
  "performance due to large number of bins." % bitdepth)
In [11]:
# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 2, figsize=(12,40))
    ax[0,0].set_title('Threshold (sophisticated)')
    ax[0,1].set_title('RFP (segment)')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='r','thresh'].values[i][0][subim], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','image'].values[i][subim], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])

Pretty good! There are some small regions which are false positives, though. We should increase the "min_size" parameter.

In [12]:
# Threshold
df.thresh = None
selem = skimage.morphology.disk(5)
df.loc[df.channel=='r','thresh'] = [bebi103_thresh(df.loc[(df.channel=='r'),'image'].values[i], selem, min_size=200) for i, _ in enumerate(df.loc[(df.channel=='r'),'image'])]

# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 2, figsize=(12,80))
    ax[0,0].set_title('Threshold (sophisticated)')
    ax[0,1].set_title('RFP (segment)')
    for i, _ in enumerate(df.index[(df.channel=='c') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='r','thresh'].values[i][0][subim], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','image'].values[i][subim], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='c','filepath'].values[i][-13:-4])
//anaconda/lib/python3.4/site-packages/skimage/filters/rank/generic.py:68: UserWarning: Bitdepth of 11 may result in bad rank filter performance due to large number of bins.
  "performance due to large number of bins." % bitdepth)
In [13]:
def zero_crossing_filter(im, thresh):
    """
    Returns image with 1 if there is a zero crossing and 0 otherwise.
    
    thresh is the the minimal value of the gradient, as computed by Sobel
    filter, at crossing to count as a crossing.
    """
    # Square structuring element
    selem = skimage.morphology.square(3)
    
    # Do max filter and min filter
    im_max = scipy.ndimage.filters.maximum_filter(im, footprint=selem)
    im_min = scipy.ndimage.filters.minimum_filter(im, footprint=selem)
    
    # Compute gradients using Sobel filter
    im_grad = skimage.filter.sobel(im)
    
    # Return edges
    return (((im >= 0) & (im_min < 0)) | ((im <= 0) & (im_max > 0))) \
                & (im_grad >= thresh)
In [14]:
df.image.values[2].astype(float)
Out[14]:
array([[ 243.,  261.,  250., ...,  401.,  363.,  339.],
       [ 241.,  251.,  246., ...,  405.,  352.,  318.],
       [ 249.,  249.,  236., ...,  402.,  358.,  340.],
       ..., 
       [ 223.,  219.,  219., ...,  218.,  211.,  219.],
       [ 227.,  216.,  215., ...,  226.,  213.,  215.],
       [ 233.,  205.,  228., ...,  197.,  225.,  220.]])
In [15]:
im_float = df.image.values[2].astype(float)
im_LoG = scipy.ndimage.filters.gaussian_laplace(im_float, 2.0)
In [16]:
plt.imshow(im_LoG[subim], cmap=plt.cm.RdBu_r)
Out[16]:
<matplotlib.image.AxesImage at 0x14535c198>
In [17]:
def edgethresh(threshold):
    im_edge = zero_crossing_filter(im_LoG, threshold)
    plt.imshow(im_edge[subim], cmap=plt.cm.RdBu_r)
ipywidgets.interact(edgethresh, threshold=(0,10,0.001))
Out[17]:
<function __main__.edgethresh>
In [18]:
df['im_float'] = [im.astype(float) for im in df.image]
In [19]:
# Compute LoG
# im_LoG = scipy.ndimage.filters.gaussian_laplace(im_float, 2.0)
im_LoG_list = []
im_edges_list = []
for i, _ in enumerate(df.loc[(df.channel=='r'),'im_float']):
    im_LoG = scipy.ndimage.filters.gaussian_laplace(df.loc[(df.channel=='r'),'im_float'].values[i], 2.0)
    im_LoG_list.append(im_LoG)
    im_edges_list.append(zero_crossing_filter(im_LoG, 3))
# pandas doesn't like setting a column to a list of arrays, so I first make the list into a pandas Series:
df.loc[df.channel=='r','LoG'] = pd.Series(im_LoG_list, index=df[df.channel=='r'].index)
# Skeletonize edges
im_edges_list = [skimage.morphology.skeletonize(im) for im in im_edges_list]
df.loc[df.channel=='r','edges'] = im_edges_list

# Fill holes
df.loc[df.channel=='r','segmented'] = [scipy.ndimage.morphology.binary_fill_holes(im) for im in df.loc[df.channel=='r','edges']]

# Remove small objects that are not bacteria
df.loc[df.channel=='r','segmented'] = [skimage.morphology.remove_small_objects(im, min_size=100) for im in df.loc[df.channel=='r','segmented']]

# Clear border with large buffer size b/c LoG procedure came off border
df.loc[df.channel=='r','segmented'] = [skimage.segmentation.clear_border(im, buffer_size=5) for im in df.loc[df.channel=='r','segmented']]

# df.loc[df.channel=='r','edges'] = [scipy.ndimage.filters.gaussian_laplace(df.loc[(df.channel=='r'),'image'].values[i], 2.0) for i, _ in enumerate(df.loc[(df.channel=='r'),'image'])]
In [20]:
plt.imshow(df['segmented'][2], cmap=plt.cm.viridis)
Out[20]:
<matplotlib.image.AxesImage at 0x14af14208>
In [21]:
plt.imshow(skimage.measure.label(df['segmented'][2], background=0))
Out[21]:
<matplotlib.image.AxesImage at 0x14a8a58d0>
In [22]:
# Label binary image; background kwarg says value in im_bw to consider backgr.
labeled_list = [skimage.measure.label(im, background=0) for im in df.loc[df.channel=='r','segmented']]
# pandas doesn't like setting a column to a list of arrays, so I first make the list into a pandas Series:
df.loc[df.channel=='r','labeled'] = pd.Series(labeled_list, index=df[df.channel=='r'].index)
In [23]:
# Look at all the strain 001 images
with sns.axes_style('white'):
    fig, ax = plt.subplots(len(df.index[(df.channel=='c') & (df.strain=='001')]), 3, figsize=(12,80))
    ax[0,0].set_title('Original image')
    ax[0,1].set_title('Edges')
    ax[0,2].set_title('Labeled')
    for i, _ in enumerate(df.index[(df.channel=='r') & (df.strain=='001')]):
        ax[i,0].imshow(df.loc[df.channel=='r','image'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,1].imshow(df.loc[df.channel=='r','edges'].values[i][subim], cmap=plt.cm.viridis)
        ax[i,2].imshow(df.loc[df.channel=='r','labeled'].values[i][subim], cmap=plt.cm.viridis)
        
        ax[i,0].set_ylabel(df.loc[df.channel=='r','filepath'].values[i][-13:-4])
In [24]:
# Get regionprops for all images,
# using labeled segmentation of corresponding red channel
regionprops_df = pd.DataFrame()
for i in df.index:
    regionprops = skimage.measure.regionprops\
    (df.loc[(df.strain==df.ix[i]['strain'])
            & (df.field_of_view==df.ix[i]['field_of_view'])
            & (df.channel=='r'),'labeled'].values[0],
     intensity_image=df.ix[i]['image'])
    regionprops_df_temp = regionprops_to_df.regionprops_to_df(regionprops)
    regionprops_df_temp['strain'] = df.ix[i]['strain']
    regionprops_df_temp['field_of_view'] = df.ix[i]['field_of_view']
    regionprops_df_temp['channel'] = df.ix[i]['channel']
    
    regionprops_df = regionprops_df.append(regionprops_df_temp, ignore_index=True)
regionprops_df
Out[24]:
area bbox centroid convex_area eccentricity equivalent_diameter euler_number extent filled_area inertia_tensor_eigvals ... min_intensity minor_axis_length orientation perimeter solidity weighted_centroid weighted_local_centroid strain field_of_view channel
0 595 (30, 1227, 73, 1266) (50.6067226891, 1248.12941176) 778 0.962780 27.524126 0 0.354800 595 (227.270610174, 16.6033535292) ... 493 16.298885 -0.805615 139.124892 0.764781 (50.5097956966, 1248.42317849) (20.5097956966, 21.4231784878) 001 001 c
1 355 (35, 1042, 54, 1073) (44.3070422535, 1057.16056338) 374 0.900218 21.260293 0 0.602716 355 (66.1251776566, 12.5378653905) ... 395 14.163539 -0.326992 78.426407 0.949198 (44.1363993241, 1057.14676719) (9.13639932406, 15.146767191) 001 001 c
2 573 (37, 410, 71, 452) (55.8551483421, 429.767888307) 706 0.955512 27.010484 0 0.401261 573 (184.595554382, 16.0590816695) ... 543 16.029514 -0.624356 121.639610 0.811615 (56.2053050562, 430.365255731) (19.2053050562, 20.3652557306) 001 001 c
3 415 (37, 938, 73, 960) (54.265060241, 949.371084337) 458 0.951941 22.986831 0 0.523990 415 (112.012085028, 10.5076652989) ... 458 12.966212 -1.171984 93.840620 0.906114 (53.7419682724, 949.291029876) (16.7419682724, 11.2910298762) 001 001 c
4 551 (50, 261, 84, 302) (66.891107078, 280.735027223) 614 0.963577 26.486883 0 0.395265 551 (172.133468778, 12.3107787707) ... 376 14.034688 0.655382 113.195959 0.897394 (66.6023238605, 280.674467449) (16.6023238605, 19.6744674486) 001 001 c
5 587 (60, 597, 100, 641) (80.6354344123, 618.921635434) 652 0.976458 27.338464 0 0.333523 587 (230.114673145, 10.707095499) ... 455 13.088679 0.736847 124.166522 0.900307 (80.3381646187, 619.116453447) (20.3381646187, 22.1164534471) 001 001 c
6 404 (67, 620, 101, 654) (84.051980198, 637.361386139) 439 0.975024 22.680141 0 0.349481 404 (152.632254118, 7.52899845489) ... 432 10.975608 0.779177 99.781746 0.920273 (83.8248482092, 637.266062695) (16.8248482092, 17.266062695) 001 001 c
7 530 (76, 132, 107, 170) (90.7641509434, 151.052830189) 617 0.942096 25.977239 0 0.449915 530 (138.368274231, 15.5601024154) ... 431 15.778518 -0.605513 105.882251 0.858995 (90.4415322885, 150.70562752) (14.4415322885, 18.7056275204) 001 001 c
8 467 (83, 708, 122, 727) (102.349036403, 717.102783726) 500 0.941804 24.384480 0 0.630229 467 (113.197765175, 12.7918995954) ... 473 14.306306 -1.330341 95.597980 0.934000 (102.038611079, 717.027372683) (19.0386110785, 9.02737268292) 001 001 c
9 448 (98, 135, 138, 150) (117.814732143, 142.582589286) 482 0.947727 23.883285 0 0.746667 448 (115.140055741, 11.722816947) ... 440 13.695440 1.475540 94.284271 0.929461 (117.081632072, 142.542646339) (19.0816320721, 7.54264633883) 001 001 c
10 390 (109, 1350, 134, 1380) (121.728205128, 1364.08717949) 405 0.925994 22.283703 0 0.520000 390 (83.6426042819, 11.922076849) ... 443 13.811344 -0.620549 83.740115 0.962963 (121.821353858, 1364.47881306) (12.8213538576, 14.4788130564) 001 001 c
11 434 (114, 470, 141, 505) (127.410138249, 486.564516129) 480 0.952461 23.507147 0 0.459259 434 (118.249466068, 10.9756236559) ... 505 13.251792 -0.566071 95.982756 0.904167 (127.119707845, 486.306743798) (13.1197078447, 16.3067437983) 001 001 c
12 404 (132, 977, 147, 1020) (139.017326733, 998.148514851) 459 0.972932 22.680141 0 0.626357 404 (147.184423869, 7.86005093748) ... 510 11.214313 0.125609 99.698485 0.880174 (139.110325288, 997.859936071) (7.11032528773, 20.8599360715) 001 001 c
13 564 (133, 1263, 187, 1287) (159.191489362, 1272.46099291) 783 0.970602 26.797520 0 0.435185 564 (253.236567779, 14.6702779528) ... 498 15.320720 1.293267 135.811183 0.720307 (158.779756116, 1272.6773337) (25.7797561155, 9.67733370097) 001 001 c
14 292 (141, 1278, 167, 1300) (153.931506849, 1288.75) 319 0.926379 19.281752 0 0.510490 292 (63.70912541, 9.0353271139) ... 599 12.023528 0.949558 73.254834 0.915361 (153.905466717, 1288.52810602) (12.9054667171, 10.5281060183) 001 001 c
15 340 (143, 915, 175, 937) (158.605882353, 925.979411765) 381 0.953867 20.806284 0 0.482955 340 (92.7838211679, 8.36336741337) ... 529 11.567795 1.092726 84.669048 0.892388 (158.451480374, 926.176137402) (15.4514803738, 11.1761374024) 001 001 c
16 1007 (146, 927, 189, 997) (171.296921549, 957.205561072) 1886 0.847252 35.807153 0 0.334551 1007 (350.921586598, 99.0175090473) ... 573 39.803017 -0.057563 248.036580 0.533934 (170.835428441, 956.865234029) (24.8354284408, 29.8652340293) 001 001 c
17 723 (147, 950, 172, 1019) (156.951590595, 984.63208852) 924 0.985699 30.340603 0 0.419130 723 (385.769104355, 10.954880731) ... 518 13.239263 0.215267 161.740115 0.782468 (156.513569954, 985.260982081) (9.51356995447, 35.2609820814) 001 001 c
18 714 (158, 398, 217, 425) (187.910364146, 413.602240896) 961 0.972443 30.151170 0 0.448211 714 (311.400596968, 16.9261533403) ... 422 16.456563 1.296233 153.260931 0.742976 (186.076568168, 414.059274706) (28.0765681676, 16.0592747065) 001 001 c
19 322 (169, 1279, 197, 1298) (183.313664596, 1287.94099379) 335 0.905188 20.248040 0 0.605263 322 (61.0661983552, 11.0306921968) ... 497 13.284994 -1.160791 74.183766 0.961194 (182.726691834, 1287.83479162) (13.7266918344, 8.83479162318) 001 001 c
20 507 (176, 638, 203, 679) (188.510848126, 657.978303748) 554 0.960137 25.407331 0 0.457995 507 (149.475171643, 11.6795653918) ... 577 13.670152 0.449260 107.982756 0.915162 (188.356032115, 657.430284763) (12.3560321145, 19.4302847629) 001 001 c
21 635 (184, 899, 230, 937) (207.404724409, 918.395275591) 684 0.972802 28.434259 0 0.363272 635 (225.247480434, 12.086018233) ... 451 13.905980 0.929001 125.923882 0.928363 (206.816831695, 918.69456317) (22.8168316952, 19.6945631705) 001 001 c
22 341 (190, 621, 222, 636) (205.275659824, 628.325513196) 368 0.919343 20.836859 0 0.710417 341 (70.8524613241, 10.9685240476) ... 520 13.247505 -1.418073 77.698485 0.926630 (204.703947894, 628.297108922) (14.7039478936, 7.29710892227) 001 001 c
23 614 (212, 1039, 256, 1079) (234.232899023, 1059.20521173) 650 0.973513 27.960134 0 0.348864 614 (222.183866067, 11.6139169335) ... 400 13.631679 0.845640 124.166522 0.944615 (233.988618973, 1059.21856627) (21.9886189735, 20.2185662679) 001 001 c
24 875 (226, 18, 284, 59) (257.790857143, 39.3131428571) 1248 0.915061 33.377906 0 0.367956 875 (278.240820032, 45.2596664986) ... 547 26.910122 1.112899 207.835570 0.701122 (257.051966352, 39.0680258818) (31.0519663522, 21.0680258818) 001 001 c
25 483 (228, 442, 251, 490) (238.97515528, 465.776397516) 544 0.978344 24.798683 0 0.437500 483 (193.791848599, 8.30246360559) ... 516 11.525598 -0.322391 116.325902 0.887868 (238.944335099, 465.656267344) (10.9443350994, 23.6562673441) 001 001 c
26 488 (232, 900, 268, 931) (249.463114754, 915.553278689) 510 0.952434 24.926711 0 0.437276 488 (129.970643653, 12.0702391742) ... 406 13.896900 -0.886642 99.539105 0.956863 (249.102905498, 915.375545367) (17.1029054976, 15.3755453665) 001 001 c
27 279 (236, 50, 259, 75) (246.878136201, 62.2078853047) 301 0.937511 18.847648 0 0.485217 279 (66.1035041207, 8.00330334578) ... 516 11.316044 0.720562 72.083261 0.926910 (247.160059674, 61.6198006914) (11.1600596742, 11.6198006914) 001 001 c
28 507 (243, 440, 266, 488) (254.043392505, 463.662721893) 601 0.976909 25.407331 0 0.459239 507 (206.152296935, 9.41056462094) ... 445 12.270657 -0.293451 118.083261 0.843594 (253.739399446, 463.325486173) (10.739399446, 23.3254861732) 001 001 c
29 266 (251, 55, 273, 79) (261.958646617, 67.037593985) 279 0.922529 18.403307 0 0.503788 266 (56.672589518, 8.44082842536) ... 502 11.621242 0.755798 68.083261 0.953405 (261.946738247, 66.6191738815) (10.9467382473, 11.6191738815) 001 001 c
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17166 532 (329, 813, 356, 855) (341.998120301, 833.979323308) 566 0.960648 26.026207 0 0.469136 532 (156.342898991, 12.0626849874) ... 833 13.892551 -0.464172 109.396970 0.939929 (341.360587548, 833.66667345) (12.3605875482, 20.66667345) 003 020 y
17167 769 (360, 1243, 407, 1275) (383.921976593, 1259.0793238) 1000 0.827855 31.290913 0 0.511303 769 (152.674627915, 48.0399102439) ... 658 27.724332 -1.203319 178.793939 0.769000 (382.052633464, 1259.7129717) (22.0526334645, 16.7129716962) 003 020 y
17168 568 (374, 218, 393, 269) (383.991197183, 243.422535211) 762 0.957742 26.892379 0 0.586171 568 (211.755253734, 17.5185975601) ... 639 16.742090 -0.001886 135.296465 0.745407 (383.380756219, 243.037323839) (9.38075621883, 25.0373238386) 003 020 y
17169 483 (415, 1114, 460, 1128) (437.401656315, 1120.71221532) 517 0.961725 24.798683 0 0.766667 483 (144.259519502, 10.831736408) ... 868 13.164641 1.509066 103.112698 0.934236 (436.906851058, 1121.10866385) (21.9068510582, 7.10866384633) 003 020 y
17170 493 (492, 526, 539, 543) (515.30020284, 534.679513185) 549 0.967171 25.054083 0 0.617021 493 (160.043645222, 10.3355293477) ... 1096 12.859567 1.427463 107.597980 0.897996 (514.882135429, 534.786241108) (22.8821354287, 8.78624110796) 003 020 y
17171 435 (512, 1073, 549, 1097) (530.393103448, 1084.73793103) 469 0.950276 23.534213 0 0.489865 435 (113.964202004, 11.0516679924) ... 637 13.297620 1.077542 94.568542 0.927505 (529.890594558, 1085.12717168) (17.8905945578, 12.1271716822) 003 020 y
17172 478 (524, 912, 560, 942) (542.223849372, 927.09832636) 514 0.954273 24.669992 0 0.442593 478 (131.452616434, 11.7470211778) ... 579 13.709571 0.926088 99.882251 0.929961 (541.813591128, 927.346157286) (17.8135911278, 15.3461572856) 003 020 y
17173 372 (567, 309, 589, 340) (577.932795699, 324.177419355) 390 0.928365 21.763389 0 0.545455 372 (81.3571755565, 11.238550828) ... 617 13.409579 -0.456795 82.669048 0.953846 (577.520772133, 323.981226557) (10.5207721333, 14.9812265572) 003 020 y
17174 423 (616, 1051, 657, 1062) (635.95035461, 1056.06382979) 428 0.963156 23.207333 0 0.937916 423 (128.050543413, 9.26206661681) ... 791 12.173457 -1.565476 94.384776 0.988318 (635.615785302, 1056.56009066) (19.6157853021, 5.56009065839) 003 020 y
17175 377 (621, 1306, 651, 1328) (635.607427056, 1316.93103448) 409 0.908285 21.909160 0 0.571212 377 (74.0032904765, 12.9518981198) ... 897 14.395498 1.065493 81.254834 0.921760 (635.282088508, 1317.25288712) (14.2820885081, 11.2528871196) 003 020 y
17176 778 (629, 789, 690, 829) (659.042416452, 808.122107969) 939 0.982691 31.473487 0 0.318852 778 (375.060474029, 12.8714025778) ... 472 14.350695 1.022088 166.509668 0.828541 (656.283670137, 809.849229904) (27.2836701371, 20.849229904) 003 020 y
17177 454 (635, 349, 673, 374) (653.964757709, 361.257709251) 482 0.953646 24.042686 0 0.477895 454 (122.966147007, 11.1358004405) ... 796 13.348139 -1.106632 97.982756 0.941909 (653.605979139, 361.105838127) (18.6059791394, 12.1058381274) 003 020 y
17178 481 (668, 384, 704, 414) (685.432432432, 398.810810811) 509 0.955034 24.747287 0 0.445370 481 (132.604283073, 11.657334054) ... 488 13.657135 -0.931285 100.468037 0.944990 (685.034609938, 398.70682656) (17.0346099384, 14.7068265597) 003 020 y
17179 682 (716, 1219, 738, 1270) (726.38856305, 1244.18475073) 740 0.962994 29.467768 0 0.607843 682 (208.281686798, 15.1299733188) ... 489 15.558907 0.214114 124.426407 0.921622 (726.193355547, 1244.66579171) (10.1933555466, 25.6657917142) 003 020 y
17180 714 (744, 523, 803, 540) (773.515406162, 530.970588235) 766 0.975507 30.151170 0 0.711864 714 (267.385381345, 12.9377459454) ... 555 14.387631 1.483886 134.526912 0.932115 (773.492607124, 531.005306547) (29.4926071237, 8.00530654686) 003 020 y
17181 888 (749, 817, 812, 873) (779.488738739, 845.386261261) 1031 0.987939 33.624942 0 0.251701 888 (508.897758903, 12.2017903926) ... 551 13.972424 -0.833332 177.764502 0.861300 (781.423094731, 847.372152395) (32.4230947305, 30.3721523954) 003 020 y
17182 627 (764, 1004, 794, 1048) (781.578947368, 1028.36682616) 874 0.911930 28.254578 0 0.475000 627 (179.22845488, 30.1789966176) ... 704 21.974165 0.337974 147.124892 0.717391 (781.168330094, 1029.46359812) (17.1683300941, 25.4635981234) 003 020 y
17183 287 (778, 824, 802, 846) (789.81184669, 835.083623693) 295 0.914303 19.115955 0 0.543561 287 (57.2607941937, 9.39367532759) ... 966 12.259641 -0.868834 68.669048 0.972881 (789.480449703, 835.393459909) (11.4804497032, 11.3934599089) 003 020 y
17184 500 (786, 931, 809, 972) (796.798, 951.244) 562 0.955741 25.231325 0 0.530223 500 (141.950629458, 12.2870305421) ... 1103 14.021144 0.358398 104.083261 0.889680 (796.434223198, 951.581772536) (10.434223198, 20.5817725356) 003 020 y
17185 373 (804, 90, 836, 112) (820.0, 100.651474531) 401 0.937857 21.792621 0 0.529830 373 (87.7173491781, 10.5633255914) ... 1172 13.000508 1.083543 84.669048 0.930175 (819.875732952, 100.391609376) (15.8757329518, 10.3916093756) 003 020 y
17186 503 (863, 59, 909, 77) (885.626242545, 68.1192842942) 556 0.965951 25.306906 0 0.607488 503 (161.232411851, 10.792591212) ... 807 13.140832 1.385149 108.183766 0.904676 (885.382542459, 68.017820222) (22.3825424592, 9.01782022196) 003 020 y
17187 415 (882, 654, 918, 674) (899.265060241, 664.612048193) 460 0.935784 22.986831 0 0.576389 415 (96.9536046746, 12.0521378134) ... 627 13.886476 -1.262658 89.840620 0.902174 (899.047009578, 664.75952059) (17.0470095781, 10.7595205898) 003 020 y
17188 455 (894, 1284, 914, 1321) (903.701098901, 1302.27472527) 496 0.938042 24.069150 0 0.614865 455 (107.351283699, 12.8904938641) ... 705 14.361334 0.242892 94.183766 0.917339 (903.581982302, 1302.62497853) (9.58198230248, 18.6249785269) 003 020 y
17189 331 (931, 505, 962, 518) (946.716012085, 510.945619335) 353 0.919076 20.529060 0 0.821340 331 (68.2898281932, 10.6054173778) ... 797 13.026384 -1.519280 76.870058 0.937677 (946.780195386, 511.071012544) (15.7801953857, 6.07101254416) 003 020 y
17190 445 (947, 362, 966, 401) (956.101123596, 381.13258427) 488 0.954566 23.803185 0 0.600540 445 (122.248544053, 10.8562356466) ... 645 13.179521 -0.256542 97.355339 0.911885 (955.926058801, 380.909230345) (8.92605880131, 18.9092303454) 003 020 y
17191 279 (949, 897, 972, 917) (960.096774194, 906.64874552) 282 0.863749 18.847648 0 0.606522 279 (44.4648447555, 11.2912991789) ... 871 13.441011 -0.888600 63.840620 0.989362 (960.016625088, 906.922955908) (11.0166250882, 9.92295590786) 003 020 y
17192 453 (981, 109, 994, 152) (987.116997792, 130.27593819) 481 0.959404 24.016193 0 0.810376 453 (131.744196803, 10.4794386129) ... 576 12.948784 -0.034567 98.284271 0.941788 (987.01869444, 129.983451261) (6.01869444018, 20.9834512607) 003 020 y
17193 449 (983, 1180, 1009, 1216) (996.113585746, 1197.60579065) 504 0.939597 23.909926 0 0.479701 449 (109.557543981, 12.8354004984) ... 673 14.330611 -0.454502 96.568542 0.890873 (996.160222738, 1197.98577783) (13.1602227378, 17.9857778293) 003 020 y
17194 496 (991, 46, 1009, 91) (999.806451613, 68.0322580645) 598 0.948309 25.130197 0 0.612346 496 (144.563688207, 14.5591005548) ... 486 15.262556 0.073963 108.911688 0.829431 (999.444587326, 67.7810638702) (8.44458732586, 21.7810638702) 003 020 y
17195 679 (993, 1052, 1020, 1102) (1007.30191458, 1076.089838) 761 0.959416 29.402885 0 0.502963 679 (200.648823683, 15.955929473) ... 605 15.977950 -0.329699 125.396970 0.892247 (1007.70183878, 1077.46879978) (14.7018387785, 25.4687997799) 003 020 y

17196 rows × 25 columns

In [25]:
# For my representative segmentations,
# instead of converting the image to RGB
# and colouring one of the channels with the segment,
# I will overlay the segment on top of the phase image
# with the alpha of the background of the segment image
# set to zero. This allows me to keep the colour information
# of the labels.
overlay_cmap = plt.cm.rainbow
overlay_cmap._init()
overlay_cmap._lut[0,-1] = 0

with sns.axes_style('white'):
    fig, ax = plt.subplots(3,1, figsize=(20,20))
    plt.suptitle('Representative segmentations:\n\
    The phase image of the first frame of each strain,\n\
    with labeled segmentations overlaid on top', size=20)
    for i in range(3):
        strain = '00' + str(i+1)
        ax[i].set_title(df.loc[(df.channel=='p') & (df.strain==strain),'filepath'].values[0])
        ax[i].imshow(df.loc[(df.channel=='p') & (df.strain==strain),'image'].values[0])
        ax[i].imshow(df.loc[(df.channel=='r') & (df.strain==strain),'labeled'].values[0], cmap=overlay_cmap)

b) As discussed in class, we are not entirely sure of how to analyze the correlation or lack thereof of the two different σ factors. Generate plots that you think may be useful and/or develop metrics for correlation among the samples. This question is very open-ended and is your chance to experience thinking about real results with the members of your group.

In [31]:
for i in range(3):
    strain = ['001','002','003'][i]
    cval = regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.channel=='c'),'mean_intensity']
    yval = regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.channel=='y'),'mean_intensity']
    color = 'rgb'[i]
    _ = plt.scatter([cval],[yval], color=color, alpha=0.2)
    # Pearson correlation
    r, _ = scipy.stats.pearsonr(cval,yval)
    print('Pearson r for strain', strain,'=', r)
_ = plt.legend(['Strain ' + str(i+1) for i in range(3)])
_ = plt.xlabel('CFP intensity')
_ = plt.ylabel('YFP intensity')
Pearson r for strain 001 = -0.190100729214
Pearson r for strain 002 = 0.945969486087
Pearson r for strain 003 = 0.874669562172

It seems clear that YFP and CFP intensities are correlated with each other far more in strains 2 and 3 than in strain 1.

But is this small or large? Let's take a cue from Costes' method in the colocalisation tutorial, and randomise pairs of mean intensities.

In [110]:
# Shuffle only within each field of view, as exposure differences between fields of view would artificially lower the correlation.
for shuffle_number in range(100):
    shuffle_number_column_heading = 'shuffle_{:03d}'.format(shuffle_number)
    for strain in regionprops_df.strain.unique():
        for field_of_view in regionprops_df.loc[(regionprops_df.strain==strain),'field_of_view'].unique():
            listtoshuffle = regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.field_of_view==field_of_view) & (regionprops_df.channel=='c'),'mean_intensity'].values
            np.random.shuffle(listtoshuffle)
            regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.field_of_view==field_of_view) & (regionprops_df.channel=='c'),shuffle_number_column_heading] = listtoshuffle
#regionprops_df.loc[(regionprops_df.strain=='002') & (regionprops_df.channel=='c')]
In [109]:
df_pearson = pd.DataFrame()
for shuffle_number in range(2):
    shuffle_number_column_heading = 'shuffle_{:03d}'.format(shuffle_number)
    for strain in regionprops_df.strain.unique():
        cval = regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.channel=='c'),shuffle_number_column_heading]
        yval = regionprops_df.loc[(regionprops_df.strain==strain) & (regionprops_df.channel=='y'),'mean_intensity']

        # Pearson correlation
        r, _ = scipy.stats.pearsonr(cval,yval)
        df_pearson_temp = pd.DataFrame(columns=['strain', 'r'])
        df_pearson_temp.strain = pd.Series(strain)
        df_pearson_temp.r = pd.Series(r)
        df_pearson = df_pearson.append(df_pearson_temp)
for strain in regionprops_df.strain.unique():
    print('The Mean Pearson r coefficient for randomised pairings of strain', strain, 'is', np.mean(df_pearson.loc[df_pearson.strain==strain,'r']), 'with s.d.', np.std(df_pearson.loc[df_pearson.strain==strain,'r']))
The Mean Pearson r coefficient for randomised pairings of strain 001 is 0.036448282763733436 with s.d. 0.019689145935413346
The Mean Pearson r coefficient for randomised pairings of strain 002 is 0.017945091014042073 with s.d. 0.013456801273329464
The Mean Pearson r coefficient for randomised pairings of strain 003 is 0.0029713858540773037 with s.d. 0.026165234791282535

What does this mean? Recall from tutorial 9a:

There are four different strains. We have

  1. $\sigma_B$-CFP, $\sigma_W$-YFP, constitutive RFP
  2. $\sigma_B$-CFP, $\sigma_B$-YFP, constitutive RFP
  3. $\sigma_W$-CFP, $\sigma_W$-YFP, constitutive RFP
  4. only constitutive RFP; no CFP or YFP

The central question is: do the $\sigma$ factors "share" the available RNA polymerases at the same time, or does one $\sigma$ factor get all of the polymerases while the other does not. We have found that the intensity of CFP and YFP in each cell in strain 1 appears to be uncorrelated (compare to randomised pairs, and contrast with strain 2 and 3), which suggests that whether or not $\sigma_B$ is using RNA polymerase in a cell is independent of whether $\sigma_W$ is.